近年来,抑郁症的发病率在全世界迅速上升,但大规模的抑郁症筛查仍然具有挑战性。步态分析提供了抑郁症的非接触,低成本和高效的早期筛查方法。然而,基于步态分析的抑郁症的早期筛查缺乏足够的有效样本数据。在本文中,我们提出了一种用于评估抑郁症风险的骨架数据增强方法。首先,我们提出了五种技术来增加骨架数据并将其应用于抑郁和情感数据集。然后,我们将增强方法分为两种类型(非噪声增强和噪声增强),基于互信息和分类准确性。最后,我们探索了哪些增强策略可以更有效地捕捉人骨架数据的特征。实验结果表明,保留了更多原始骨架数据属性的增强训练数据集确定了检测模型的性能。具体而言,旋转增强和通道掩码增强使抑郁检测精度分别达到92.15%和91.34%。
translated by 谷歌翻译
人体步态是指不仅代表活动能力的每日运动,而且还可以用人类观察者或计算机来识别步行者。最近的研究表明,步态甚至传达了有关沃克情绪的信息。不同情绪状态中的个体可能显示出不同的步态模式。各种情绪和步态模式之间的映射为自动情绪识别提供了新的来源。与传统的情绪检测生物识别技术(例如面部表达,言语和生理参数)相比,步态是可以观察到的,更难以模仿,并且需要从该主题中进行较少的合作。这些优势使步态成为情感检测的有前途的来源。本文回顾了有关基于步态的情绪检测的当前研究,尤其是关于步态参数如何受到不同情绪状态的影响以及如何通过不同的步态模式识别情绪状态的研究。我们专注于情感识别过程中应用的详细方法和技术:数据收集,预处理和分类。最后,我们讨论了使用智能计算和大数据的最先进技术的状态来讨论高效有效的基于步态的情感识别的可能发展。
translated by 谷歌翻译
由于异质访问点(APS)的性质,负载平衡(LB)是混合灯保真度(LIFI)和无线保真度(WIFI)网络(HLWNETS)的挑战性问题。机器学习有可能以近乎最佳的网络性能为培训过程提供复杂性的LB解决方案。但是,当网络环境(尤其是用户数量)更改时,需要进行最先进的(SOTA)学习辅助LB方法,这大大限制了其实用性。在本文中,提出了一个新颖的深神经网络(DNN)结构,称为自适应目标条件神经网络(A-TCNN),该结构在其他用户的条件下为一个目标用户进行AP选择。此外,开发了一种自适应机制,可以通过分配数据速率要求将较大数量的用户映射到较大的数字,而不会影响目标用户的AP选择结果。这使提出的方法可以处理不同数量的用户,而无需再进行重新培训。结果表明,A-TCNN实现了非常接近测试数据集的网络吞吐量,差距小于3%。还证明,A-TCNN可以获得与两个SOTA基准相当的网络吞吐量,同时最多将运行时降低了三个数量级。
translated by 谷歌翻译
组织病理组织分类是病理学癌症研究的基本任务。精确区分不同的组织类型是下游研究的好处,如癌症诊断,预后等。现有的作品主要利用计算机视觉中的流行分类骨干,以实现组织病理组织分类。在本文中,我们提出了一种超级轻型即插即用模块,名为金字塔深广阔的学习(PDBL),对于任何训练有素的分类骨架,以进一步提高分类性能而无需重新培训负担。我们模仿病理学家如何观察不同放大率的病理学幻灯片,并为输入图像构造图像金字塔,以获得金字塔内部信息。对于金字塔中的每个级别,我们通过我们提出的深层块(DB-Block)提取多种深度广泛的功能。我们用三个流行的分类骨干网,Shufflenetv2,EppositionNetB0和Reset50配备了PDBL,以评估我们建议模块在两个数据集(Kather Multiclass DataSet和LC25000数据集)上的提出模块的有效性和效率。实验结果表明,所提出的PDBL可以稳定地改善任何CNN骨架的组织级分类性能,特别是对于在训练样本(小于10%)中的小型时,特别是轻量级模型,这极大地节省了计算时间和注释工作。
translated by 谷歌翻译
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc, revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) An intermediate layer of the teacher network as target perform better than that using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over the scratch MIM pre-training on ImageNet-1K classification, using all the ViT-Tiny, ViT-Small, and ViT-base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in AE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
translated by 谷歌翻译
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
translated by 谷歌翻译
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
translated by 谷歌翻译
Decompilation aims to transform a low-level program language (LPL) (eg., binary file) into its functionally-equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP, that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
translated by 谷歌翻译
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
translated by 谷歌翻译
In this paper, we propose a novel framework dubbed peer learning to deal with the problem of biased scene graph generation (SGG). This framework uses predicate sampling and consensus voting (PSCV) to encourage different peers to learn from each other, improving model diversity and mitigating bias in SGG. To address the heavily long-tailed distribution of predicate classes, we propose to use predicate sampling to divide and conquer this issue. As a result, the model is less biased and makes more balanced predicate predictions. Specifically, one peer may not be sufficiently diverse to discriminate between different levels of predicate distributions. Therefore, we sample the data distribution based on frequency of predicates into sub-distributions, selecting head, body, and tail classes to combine and feed to different peers as complementary predicate knowledge during the training process. The complementary predicate knowledge of these peers is then ensembled utilizing a consensus voting strategy, which simulates a civilized voting process in our society that emphasizes the majority opinion and diminishes the minority opinion. This approach ensures that the learned representations of each peer are optimally adapted to the various data distributions. Extensive experiments on the Visual Genome dataset demonstrate that PSCV outperforms previous methods. We have established a new state-of-the-art (SOTA) on the SGCls task by achieving a mean of \textbf{31.6}.
translated by 谷歌翻译